Overview

Dataset statistics

Number of variables: 20
Number of observations: 4410
Missing cells: 28
Missing cells (%): < 0.1%
Duplicate rows: 2912
Duplicate rows (%): 66.0%
Total size in memory: 689.2 KiB
Average record size in memory: 160.0 B
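
The percentages and the average record size above can be cross-checked from the raw counts. The sketch below is only a sanity check; it assumes that the missing-cell percentage is taken over variables × observations and that 1 KiB = 1024 B, neither of which the report states explicitly. All variable names are illustrative.

```python
# Raw figures copied from the dataset statistics.
n_variables = 20
n_observations = 4410
missing_cells = 28
duplicate_rows = 2912
total_size_kib = 689.2

# Assumption: the missing percentage is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of duplicate rows among all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells:       {total_cells}")
print(f"missing cells (%): {missing_pct:.3f}")
print(f"duplicate rows (%): {duplicate_pct:.1f}")
print(f"avg record size (B): {avg_record_bytes:.1f}")
```

Under these assumptions the missing share comes out well below 0.1%, consistent with the "< 0.1%" shown above.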

Variable types

NUM: 10
CAT: 9
BOOL: 1

Reproduction

Analysis started: 2020-07-25 02:14:49.681573
Analysis finished: 2020-07-25 02:15:28.705953
Duration: 39.02 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 2912 (66.0%) duplicate rows (Duplicates)
NumCompaniesWorked has 586 (13.3%) zeros (Zeros)
TrainingTimesLastYear has 162 (3.7%) zeros (Zeros)
YearsAtCompany has 132 (3.0%) zeros (Zeros)
YearsSinceLastPromotion has 1743 (39.5%) zeros (Zeros)
YearsWithCurrManager has 789 (17.9%) zeros (Zeros)

Variables

Age
Real number (ℝ≥0)

Distinct count: 43
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.923809523809524
Minimum: 18
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 18
5-th percentile: 24
Q1: 30
Median: 36
Q3: 43
95-th percentile: 54
Maximum: 60
Range: 42
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.133301271
Coefficient of variation (CV): 0.2473553349
Kurtosis: -0.4059505398
Mean: 36.92380952
Median Absolute Deviation (MAD): 6
Skewness: 0.4130049527
Sum: 162834
Variance: 83.41719211
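
Several of the Age figures above are derived from others: the range and IQR from the quantiles, the mean from the sum, and the CV and variance from the standard deviation. A minimal sketch of those relationships, using only numbers from this report (the variable names are illustrative, and the exact estimators the profiler uses are not stated here):

```python
# Figures copied from the Age statistics.
n_observations = 4410
minimum, q1, q3, maximum = 18, 30, 43, 60
total = 162834
std = 9.133301271

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
mean = total / n_observations     # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance = squared standard deviation

print(value_range, iqr)
print(round(mean, 8), round(cv, 6), round(variance, 4))
```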
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
35                   234   5.3%
34                   231   5.2%
36                   207   4.7%
31                   207   4.7%
29                   204   4.6%
32                   183   4.1%
30                   180   4.1%
38                   174   3.9%
33                   174   3.9%
40                   171   3.9%
Other values (33)   2445  55.4%

Minimum 5 values

Value  Count  Frequency (%)
18        24  0.5%
19        27  0.6%
20        33  0.7%
21        39  0.9%
22        48  1.1%

Maximum 5 values

Value  Count  Frequency (%)
60        15  0.3%
59        30  0.7%
58        42  1.0%
57        12  0.3%
56        42  1.0%

Attrition
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value  Count  Frequency (%)
0       3699  83.9%
1        711  16.1%

BusinessTravel
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value              Count  Frequency (%)
Travel_Rarely       3129  71.0%
Travel_Frequently    831  18.8%
Non-Travel           450  10.2%

Length

Max length: 17
Median length: 13
Mean length: 13.44761905
Min length: 10

Department
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value                   Count  Frequency (%)
Research & Development   2883  65.4%
Sales                    1338  30.3%
Human Resources           189   4.3%

Length

Max length: 22
Median length: 22
Mean length: 16.54217687
Min length: 5

DistanceFromHome
Real number (ℝ≥0)

Distinct count: 29
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.19251700680272
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 7
Q3: 14
95-th percentile: 26
Maximum: 29
Range: 28
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 8.105025519
Coefficient of variation (CV): 0.8816981805
Kurtosis: -0.2270453549
Mean: 9.192517007
Median Absolute Deviation (MAD): 5
Skewness: 0.9574657464
Sum: 40539
Variance: 65.69143866

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
2                    633  14.4%
1                    624  14.1%
10                   258   5.9%
9                    255   5.8%
7                    252   5.7%
3                    252   5.7%
8                    240   5.4%
5                    195   4.4%
4                    192   4.4%
6                    177   4.0%
Other values (19)   1332  30.2%

Minimum 5 values

Value  Count  Frequency (%)
1        624  14.1%
2        633  14.4%
3        252   5.7%
4        192   4.4%
5        195   4.4%

Maximum 5 values

Value  Count  Frequency (%)
29        81  1.8%
28        69  1.6%
27        36  0.8%
26        75  1.7%
25        75  1.7%

Education
Categorical

Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value          Count  Frequency (%)
Bachelor        1716  38.9%
Master          1194  27.1%
College          846  19.2%
Below College    510  11.6%
Doctor           144   3.3%

Length

Max length: 13
Median length: 8
Mean length: 7.779591837
Min length: 6

EducationField
Categorical

Distinct count: 6
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value             Count  Frequency (%)
Life Sciences      1818  41.2%
Medical            1392  31.6%
Marketing           477  10.8%
Technical Degree    396   9.0%
Other               246   5.6%
Human Resources      81   1.8%

Length

Max length: 16
Median length: 13
Mean length: 10.53333333
Min length: 5

Gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value   Count  Frequency (%)
Male     2646  60.0%
Female   1764  40.0%

Length

Max length: 6
Median length: 4
Mean length: 4.8
Min length: 4

JobLevel
Categorical

Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value      Count  Frequency (%)
Low         1629  36.9%
Medium      1602  36.3%
High         654  14.8%
Very High    318   7.2%
Exemplary    207   4.7%

Length

Max length: 9
Median length: 4
Mean length: 4.952380952
Min length: 3

JobRole
Categorical

Distinct count: 9
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value                      Count  Frequency (%)
Sales Executive              978  22.2%
Research Scientist           876  19.9%
Laboratory Technician        777  17.6%
Manufacturing Director       435   9.9%
Healthcare Representative    393   8.9%
Manager                      306   6.9%
Sales Representative         249   5.6%
Research Director            240   5.4%
Human Resources              156   3.5%

Length

Max length: 25
Median length: 18
Mean length: 18.0707483
Min length: 7

MaritalStatus
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value     Count  Frequency (%)
Married    2019  45.8%
Single     1410  32.0%
Divorced    981  22.2%

Length

Max length: 8
Median length: 7
Mean length: 6.902721088
Min length: 6

MonthlyIncome
Real number (ℝ≥0)

Distinct count: 1349
Unique (%): 30.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65029.31292517007
Minimum: 10090
Maximum: 199990
Zeros: 0
Zeros (%): 0.0%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 10090
5-th percentile: 20970
Q1: 29110
Median: 49190
Q3: 83800
95-th percentile: 178560
Maximum: 199990
Range: 189900
Interquartile range (IQR): 54690

Descriptive statistics

Standard deviation: 47068.88856
Coefficient of variation (CV): 0.7238103317
Kurtosis: 1.000231855
Mean: 65029.31293
Median Absolute Deviation (MAD): 21990
Skewness: 1.368884163
Sum: 286779270
Variance: 2215480270

Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
23420                    12   0.3%
61420                     9   0.2%
27410                     9   0.2%
24040                     9   0.2%
26100                     9   0.2%
23800                     9   0.2%
55620                     9   0.2%
34520                     9   0.2%
63470                     9   0.2%
25590                     9   0.2%
Other values (1339)    4317  97.9%

Minimum 5 values

Value  Count  Frequency (%)
10090      3  0.1%
10510      3  0.1%
10520      3  0.1%
10810      3  0.1%
10910      3  0.1%

Maximum 5 values

Value   Count  Frequency (%)
199990      3  0.1%
199730      3  0.1%
199430      3  0.1%
199260      3  0.1%
198590      3  0.1%

NumCompaniesWorked
Real number (ℝ≥0)

ZEROS

Distinct count: 10
Unique (%): 0.2%
Missing: 19
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.6948303347756775
Minimum: 0.0
Maximum: 9.0
Zeros: 586
Zeros (%): 13.3%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 9
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.498886889
Coefficient of variation (CV): 0.9272891345
Kurtosis: 0.007287480878
Mean: 2.694830335
Median Absolute Deviation (MAD): 1
Skewness: 1.026766676
Sum: 11833
Variance: 6.244435683

Histogram with fixed size bins (bins=10)

Common values

Value      Count  Frequency (%)
1           1558  35.3%
0            586  13.3%
3            474  10.7%
2            438   9.9%
4            415   9.4%
7            222   5.0%
6            208   4.7%
5            187   4.2%
9            156   3.5%
8            147   3.3%
(Missing)     19   0.4%

Minimum 5 values

Value  Count  Frequency (%)
0        586  13.3%
1       1558  35.3%
2        438   9.9%
3        474  10.7%
4        415   9.4%

Maximum 5 values

Value  Count  Frequency (%)
9        156  3.5%
8        147  3.3%
7        222  5.0%
6        208  4.7%
5        187  4.2%

PercentSalaryHike
Real number (ℝ≥0)

Distinct count: 15
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.209523809523809
Minimum: 11
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 12
Median: 14
Q3: 18
95-th percentile: 22
Maximum: 25
Range: 14
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.659107516
Coefficient of variation (CV): 0.2405800183
Kurtosis: -0.3026383931
Mean: 15.20952381
Median Absolute Deviation (MAD): 2
Skewness: 0.8205689838
Sum: 67074
Variance: 13.38906782

Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
11                  630  14.3%
13                  627  14.2%
14                  603  13.7%
12                  594  13.5%
15                  303   6.9%
18                  267   6.1%
17                  246   5.6%
16                  234   5.3%
19                  228   5.2%
22                  168   3.8%
Other values (5)    510  11.6%

Minimum 5 values

Value  Count  Frequency (%)
11       630  14.3%
12       594  13.5%
13       627  14.2%
14       603  13.7%
15       303   6.9%

Maximum 5 values

Value  Count  Frequency (%)
25        54  1.2%
24        63  1.4%
23        84  1.9%
22       168  3.8%
21       144  3.3%

StockOptionLevel
Categorical

Distinct count: 4
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 34.5 KiB

Value  Count  Frequency (%)
0       1893  42.9%
1       1788  40.5%
2        474  10.7%
3        255   5.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

TotalWorkingYears
Real number (ℝ≥0)

Distinct count: 40
Unique (%): 0.9%
Missing: 9
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.279936378095888
Minimum: 0.0
Maximum: 40.0
Zeros: 33
Zeros (%): 0.7%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 6
Median: 10
Q3: 15
95-th percentile: 28
Maximum: 40
Range: 40
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.782222141
Coefficient of variation (CV): 0.6899172017
Kurtosis: 0.9129359961
Mean: 11.27993638
Median Absolute Deviation (MAD): 4
Skewness: 1.116831796
Sum: 49643
Variance: 60.56298145

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
10                   605  13.7%
6                    375   8.5%
8                    307   7.0%
9                    287   6.5%
5                    264   6.0%
7                    243   5.5%
1                    242   5.5%
4                    189   4.3%
12                   144   3.3%
3                    126   2.9%
Other values (30)   1619  36.7%

Minimum 5 values

Value  Count  Frequency (%)
0         33  0.7%
1        242  5.5%
2         93  2.1%
3        126  2.9%
4        189  4.3%

Maximum 5 values

Value  Count  Frequency (%)
40         6  0.1%
38         3  0.1%
37        12  0.3%
36        18  0.4%
35         9  0.2%

TrainingTimesLastYear
Real number (ℝ≥0)

ZEROS

Distinct count: 7
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7993197278911564
Minimum: 0
Maximum: 6
Zeros: 162
Zeros (%): 3.7%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.28897817
Coefficient of variation (CV): 0.4604612174
Kurtosis: 0.4911489985
Mean: 2.799319728
Median Absolute Deviation (MAD): 1
Skewness: 0.5527476257
Sum: 12345
Variance: 1.661464722

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
2       1641  37.2%
3       1473  33.4%
4        369   8.4%
5        357   8.1%
1        213   4.8%
6        195   4.4%
0        162   3.7%

Minimum 5 values

Value  Count  Frequency (%)
0        162   3.7%
1        213   4.8%
2       1641  37.2%
3       1473  33.4%
4        369   8.4%

Maximum 5 values

Value  Count  Frequency (%)
6        195   4.4%
5        357   8.1%
4        369   8.4%
3       1473  33.4%
2       1641  37.2%

YearsAtCompany
Real number (ℝ≥0)

ZEROS

Distinct count: 37
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.0081632653061225
Minimum: 0
Maximum: 40
Zeros: 132
Zeros (%): 3.0%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
Median: 5
Q3: 9
95-th percentile: 20
Maximum: 40
Range: 40
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 6.125135445
Coefficient of variation (CV): 0.8740001072
Kurtosis: 3.923864205
Mean: 7.008163265
Median Absolute Deviation (MAD): 3
Skewness: 1.763328232
Sum: 30906
Variance: 37.51728422

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
5                    588  13.3%
1                    513  11.6%
3                    384   8.7%
2                    381   8.6%
10                   360   8.2%
4                    330   7.5%
7                    270   6.1%
9                    246   5.6%
8                    240   5.4%
6                    228   5.2%
Other values (27)    870  19.7%

Minimum 5 values

Value  Count  Frequency (%)
0        132   3.0%
1        513  11.6%
2        381   8.6%
3        384   8.7%
4        330   7.5%

Maximum 5 values

Value  Count  Frequency (%)
40         3  0.1%
37         3  0.1%
36         6  0.1%
34         3  0.1%
33        15  0.3%

YearsSinceLastPromotion
Real number (ℝ≥0)

ZEROS

Distinct count: 16
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1877551020408164
Minimum: 0
Maximum: 15
Zeros: 1743
Zeros (%): 39.5%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 3
95-th percentile: 9
Maximum: 15
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.221699321
Coefficient of variation (CV): 1.4726051
Kurtosis: 3.601760518
Mean: 2.187755102
Median Absolute Deviation (MAD): 1
Skewness: 1.982939156
Sum: 9648
Variance: 10.37934651

Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
0                  1743  39.5%
1                  1071  24.3%
2                   477  10.8%
7                   228   5.2%
4                   183   4.1%
3                   156   3.5%
5                   135   3.1%
6                    96   2.2%
11                   72   1.6%
8                    54   1.2%
Other values (6)    195   4.4%

Minimum 5 values

Value  Count  Frequency (%)
0       1743  39.5%
1       1071  24.3%
2        477  10.8%
3        156   3.5%
4        183   4.1%

Maximum 5 values

Value  Count  Frequency (%)
15        39  0.9%
14        27  0.6%
13        30  0.7%
12        30  0.7%
11        72  1.6%

YearsWithCurrManager
Real number (ℝ≥0)

ZEROS

Distinct count: 18
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.12312925170068
Minimum: 0
Maximum: 17
Zeros: 789
Zeros (%): 17.9%
Memory size: 34.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 3
Q3: 7
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.567326744
Coefficient of variation (CV): 0.8651988638
Kurtosis: 0.1679485428
Mean: 4.123129252
Median Absolute Deviation (MAD): 3
Skewness: 0.8328836111
Sum: 18183
Variance: 12.7258201

Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
2                  1032  23.4%
0                   789  17.9%
7                   648  14.7%
3                   426   9.7%
8                   321   7.3%
4                   294   6.7%
1                   228   5.2%
9                   192   4.4%
5                    93   2.1%
6                    87   2.0%
Other values (8)    300   6.8%

Minimum 5 values

Value  Count  Frequency (%)
0        789  17.9%
1        228   5.2%
2       1032  23.4%
3        426   9.7%
4        294   6.7%

Maximum 5 values

Value  Count  Frequency (%)
17        21  0.5%
16         6  0.1%
15        15  0.3%
14        15  0.3%
13        42  1.0%

Interactions

[Figures: 100 interaction plots, rendered with Matplotlib v3.3.0]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
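
The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation of the textbook formula, not the code the report itself runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r = pearson_r(x, y)
# Shifting and (positively) rescaling y leaves r unchanged.
r_rescaled = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r, r_rescaled)
```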

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
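
A minimal sketch of this recipe: convert each variable to ranks, then apply the covariance-over-standard-deviations formula to the ranks. Assigning tied values their average rank is a common convention and an assumption here; the helper names are illustrative and this is not the report's own implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def correlation(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    return correlation(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but nonlinear
rho = spearman_rho(x, y)     # ranks agree exactly
r = correlation(x, y)        # linear fit is imperfect
print(rho, r)
```

On this monotonic but nonlinear example the rank-based coefficient reaches its maximum while the linear coefficient stays below it, which is the advantage the paragraph above describes.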

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
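
The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations frequently use a tie-adjusted variant instead, and which variant produced the figures in this report is not stated here. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie in x or y: counted in neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)  # two concordant pairs, one discordant pair, three pairs in total
```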

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
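
As an illustration of the bias correction, the sketch below follows the commonly cited form of Bergsma's adjustment: the squared φ is reduced by (r-1)(k-1)/(n-1) and the numbers of rows r and columns k are shrunk before normalising. It is a plain-Python sketch under that assumption, not the report's own code; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for i, row_n in row_totals.items():
        for j, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(i, j)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected quantities (assumed form of the 2013 correction).
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


# Toy data: perfectly associated labels, then independent labels.
v_perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(v_perfect, v_independent)
```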

Missing values

[Figures: 4 missing-value plots, rendered with Matplotlib v3.3.0]

Sample

First rows

AgeAttritionBusinessTravelDepartmentDistanceFromHomeEducationEducationFieldGenderJobLevelJobRoleMaritalStatusMonthlyIncomeNumCompaniesWorkedPercentSalaryHikeStockOptionLevelTotalWorkingYearsTrainingTimesLastYearYearsAtCompanyYearsSinceLastPromotionYearsWithCurrManager
0510Travel_RarelySales6CollegeLife SciencesFemaleLowHealthcare RepresentativeMarried1311601.01101.06100
1311Travel_FrequentlyResearch & Development10Below CollegeLife SciencesFemaleLowResearch ScientistSingle418900.02316.03514
2320Travel_FrequentlyResearch & Development17MasterOtherMaleVery HighSales ExecutiveMarried1932801.01535.02503
3380Non-TravelResearch & Development2DoctorLife SciencesMaleHighHuman ResourcesMarried832103.011313.05875
4320Travel_RarelyResearch & Development10Below CollegeMedicalMaleLowSales ExecutiveSingle234204.01229.02604
5460Travel_RarelyResearch & Development8BachelorLife SciencesFemaleVery HighResearch DirectorMarried407103.013028.05777
6281Travel_RarelyResearch & Development11CollegeMedicalMaleMediumSales ExecutiveSingle581302.02015.02000
7290Travel_RarelyResearch & Development18BachelorLife SciencesMaleMediumSales ExecutiveMarried314302.022310.02000
8310Travel_RarelyResearch & Development1BachelorLife SciencesMaleHighLaboratory TechnicianMarried204400.021010.02978
9250Non-TravelResearch & Development7MasterMedicalFemaleVery HighLaboratory TechnicianDivorced1346401.01316.02615

Last rows

AgeAttritionBusinessTravelDepartmentDistanceFromHomeEducationEducationFieldGenderJobLevelJobRoleMaritalStatusMonthlyIncomeNumCompaniesWorkedPercentSalaryHikeStockOptionLevelTotalWorkingYearsTrainingTimesLastYearYearsAtCompanyYearsSinceLastPromotionYearsWithCurrManager
4400370Travel_RarelyResearch & Development22DoctorMedicalFemaleMediumManufacturing DirectorMarried305502.014317.03302
4401450Travel_FrequentlySales21Below CollegeMarketingMaleHighResearch ScientistMarried228904.01309.03302
4402371Travel_FrequentlySales2BachelorMarketingMaleLowLaboratory TechnicianDivorced400106.011117.02100
4403390Travel_FrequentlyResearch & Development22BachelorMedicalFemaleLowManufacturing DirectorSingle1296500.019120.0219118
4404290Travel_RarelySales4BachelorOtherFemaleMediumHuman ResourcesSingle353901.01806.02615
4405420Travel_RarelyResearch & Development5MasterMedicalFemaleLowResearch ScientistSingle602903.017110.05302
4406290Travel_RarelyResearch & Development2MasterMedicalMaleLowLaboratory TechnicianDivorced267902.015010.02302
4407250Travel_RarelyResearch & Development25CollegeLife SciencesMaleMediumSales ExecutiveMarried370200.02005.04412
4408420Travel_RarelySales18CollegeMedicalMaleLowLaboratory TechnicianDivorced239800.014110.02978
4409400Travel_RarelyResearch & Development28BachelorMedicalMaleMediumLaboratory TechnicianDivorced546800.0120NaN62139

Duplicate rows

Most frequent

AgeAttritionBusinessTravelDepartmentDistanceFromHomeEducationEducationFieldGenderJobLevelJobRoleMaritalStatusMonthlyIncomeNumCompaniesWorkedPercentSalaryHikeStockOptionLevelTotalWorkingYearsTrainingTimesLastYearYearsAtCompanyYearsSinceLastPromotionYearsWithCurrManagercount
0180Non-TravelResearch & Development1MasterMedicalMaleMediumSales ExecutiveSingle272001.02210.020003
1180Non-TravelResearch & Development2BachelorLife SciencesMaleHighSales RepresentativeSingle1860601.02420.040003
2180Non-TravelSales5MasterOtherMaleMediumManagerSingle323001.01210.030003
3180Travel_RarelySales7BachelorLife SciencesMaleLowResearch ScientistSingle381201.01500.030003
4181Non-TravelResearch & Development2MasterMedicalMaleHighLaboratory TechnicianSingle1096501.01800.050003
5181Travel_FrequentlyResearch & Development2BachelorTechnical DegreeMaleLowSales ExecutiveSingle346801.01820.040003
7181Travel_RarelyResearch & Development1MasterLife SciencesMaleLowSales ExecutiveSingle233501.01420.030003
8190Travel_RarelyResearch & Development1BachelorOtherFemaleExemplaryManufacturing DirectorSingle1520201.01831.021003
9190Travel_RarelyResearch & Development23MasterLife SciencesMaleMediumLaboratory TechnicianSingle1919701.01201.021003
10190Travel_RarelySales2MasterMarketingMaleHighLaboratory TechnicianSingle1155701.02201.001013